187 research outputs found

    An ontology enhanced parallel SVM for scalable spam filter training

    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2013 Elsevier B.V.
    Spam, in a variety of shapes and forms, continues to inflict increasing damage. Various approaches, including Support Vector Machine (SVM) techniques, have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce-based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the accuracy degradation that arises when the training data are distributed among a number of SVM classifiers. Experimental results show that ontology-based augmentation improves the accuracy of the parallel SVM beyond that of the original sequential counterpart.
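The map/reduce split described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not specify the solver or how partial models are merged, so the sketch assumes a Pegasos-style linear SVM per partition and plain weight averaging as the reduce step, and it omits the ontology-based augmentation entirely. All function names and the toy data are hypothetical, and everything runs in a single process rather than on a MapReduce cluster.

```python
import random

def train_linear_svm(samples, lam=0.01, epochs=20, seed=0):
    """Pegasos-style sub-gradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)
        for i in order:
            x, y = samples[i]
            t += 1
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            shrink = 1.0 - 1.0 / t              # effect of the L2 regulariser
            w = [shrink * wj for wj in w]
            if margin < 1.0:                    # hinge loss is active for this sample
                step = y / (lam * t)
                w = [wj + step * xj for wj, xj in zip(w, x)]
    return w

def map_phase(samples, n_parts):
    """'Map' (assumed form): train one independent SVM per data partition."""
    parts = [samples[k::n_parts] for k in range(n_parts)]
    return [train_linear_svm(part, seed=k) for k, part in enumerate(parts)]

def reduce_phase(weight_vectors):
    """'Reduce' (assumed form): average the per-partition weight vectors."""
    n = len(weight_vectors)
    return [sum(column) / n for column in zip(*weight_vectors)]

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

def make_toy_data(n, seed=42):
    """Two well-separated clusters; the trailing 1.0 acts as a bias feature."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = [2.0 * y + rng.uniform(-1, 1), 2.0 * y + rng.uniform(-1, 1), 1.0]
        data.append((x, y))
    return data

data = make_toy_data(200)
model = reduce_phase(map_phase(data, n_parts=4))
accuracy = sum(predict(model, x) == y for x, y in data) / len(data)
```

Each partition is trained without seeing the others, which is where the speed-up comes from and also why accuracy can degrade on harder data; the abstract attributes the recovery of that accuracy to ontology semantics, which this sketch does not model.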

    Inferring short-term volatility indicators from Bitcoin blockchain

    In this paper, we study the possibility of inferring early warning indicators (EWIs) for periods of extreme bitcoin price volatility using features obtained from Bitcoin daily transaction graphs. We infer low-dimensional representations of the transaction graphs for the period from 2012 to 2017 using the Bitcoin blockchain, and demonstrate how these representations can be used to predict extreme price volatility events. Our EWI, which is obtained with a non-negative decomposition, contains more predictive information than those obtained with singular value decomposition or with the scalar value of the total Bitcoin transaction volume.

    Towards a comprehensive C-budgeting approach of a coccolithophorid bloom in the Northern Bay of Biscay (June 2006)

    A biogeochemical multidisciplinary survey was carried out in the northern Bay of Biscay, in early June 2006, during which 14C-based primary production and calcification were determined as well as O2-based community respiration. Contemporary remote sensing images showed several patches of high reflectance (HR) in the investigated area. Based on remote sensing and in situ measured biogeochemical parameters, the area exhibited varying coccolithophorid bloom stages from its early development to the post-bloom stages. The major HR patch, characterizing a post-stationary stage of the bloom, was located between 48°N and 49°N over the shelf along the continental margin. It was associated with moderate chlorophyll-a levels, never exceeding 1.0 µg L⁻¹, dissolved phosphorus and silica depletion, and undersaturation of CO2 with respect to atmospheric equilibrium. Considered as the main drivers of the C cycle in this area, the CO2 fluxes associated with primary production, calcification and respiration were integrated in order to provide a comprehensive C budget in the area.

    L2-norm multiple kernel learning and its application to biomedical data fusion

    Background: This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have advantages over sparse integration methods for thoroughly combining complementary information in heterogeneous data sources.
    Results: We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem and the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large-scale data sets.
    Conclusions: This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to the performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM-based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL performs comparably to the conventional SVM MKL algorithms. Moreover, large-scale numerical experiments indicate that, when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL.
    Availability: The MATLAB code of the algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.
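The link between the chosen norm and the sparsity of the kernel weights can be sketched with a standard duality argument. The notation below is an assumption for illustration and is not taken from the paper: K_j are the m base kernels, θ_j their non-negative weights, α a vector of dual variables, and s_j the quadratic term contributed by kernel j. The paper's exact dual (labels, constraints, signs) may differ.

```latex
K_\theta = \sum_{j=1}^{m} \theta_j K_j, \qquad \theta_j \ge 0,
\qquad s_j = \alpha^\top K_j \alpha \ge 0 .

\max_{\theta \ge 0,\ \|\theta\|_p \le 1} \ \sum_{j=1}^{m} \theta_j s_j
  = \|s\|_q , \qquad \frac{1}{p} + \frac{1}{q} = 1 .
```

Under these assumptions, p = 1 gives q = ∞ and puts all weight on the largest s_j (a sparse, "winner-takes-all" solution); p = 2 gives q = 2 with θ proportional to s (non-sparse weights); and p = ∞ gives q = 1 with every θ_j = 1 (uniform averaging). This is consistent with the abstract's description of L∞ MKL as sparse, L1 MKL as averaging, and L2 MKL as non-sparse.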

    Intelligent image-based colourimetric tests using machine learning framework for lateral flow assays

    This paper examines the scope of an intelligent colourimetric test that fulfils the ASSURED criteria (Affordable, Sensitive, Specific, User-friendly, Rapid and robust, Equipment-free, and Deliverable) and demonstrates that claim. It presents an investigation into an intelligent image-based system that performs automatic paper-based colourimetric tests in real time, providing a proof-of-concept for a dry-chemical-based or microfluidic, stable and semi-quantitative assay using a larger dataset with diverse conditions. Universal pH indicator papers were used as a case study. Unlike previous work in the literature, this work performs multiclass colourimetric tests using histogram-based image processing and a machine learning algorithm without any user intervention. The proposed image processing framework is based on colour channel separation, global thresholding, morphological operations and object detection. We have also deployed a server-based convolutional neural network framework for image classification using inductive transfer learning on a mobile platform. The results obtained by both traditional machine learning and pre-trained model-based deep learning were critically analysed against the set evaluation criteria (the ASSURED criteria). The features were optimised using univariate analysis and exploratory data analysis to improve the performance. The image processing algorithm showed >98% accuracy, while the classification accuracy of the Least Squares Support Vector Machine (LS-SVM) was 100%. The deep learning technique, on the other hand, provided >86% accuracy, which could be further improved with a larger amount of data. The k-fold cross-validated LS-SVM-based final system, examined on different datasets, confirmed the robustness and reliability of the presented approach, which was further validated using statistical analysis. Understaffed and resource-limited healthcare systems can benefit from such an easy-to-use technology to support remote aid workers, assist in elderly care and promote personalised healthcare by eliminating the subjectivity of interpretation.

    Rapid decline of the CO2 buffering capacity in the North Sea and implications for the North Atlantic Ocean

    Author Posting. © American Geophysical Union, 2007. This article is posted here by permission of American Geophysical Union for personal use, not for redistribution. The definitive version was published in Global Biogeochemical Cycles 21 (2007): GB4001, doi:10.1029/2006GB002825.
    New observations from the North Sea, a NW European shelf sea, show that between 2001 and 2005 the CO2 partial pressure (pCO2) in surface waters rose by 22 μatm, thus faster than atmospheric pCO2, which in the same period rose approximately 11 μatm. The surprisingly rapid decline in air-sea partial pressure difference (ΔpCO2) is primarily a response to an elevated water column inventory of dissolved inorganic carbon (DIC), which, in turn, reflects mostly anthropogenic CO2 input rather than natural interannual variability. The resulting decline in the buffering capacity of the inorganic carbonate system (increasing Revelle factor) sets up a theoretically predicted feedback loop whereby the invasion of anthropogenic CO2 reduces the ocean's ability to take up additional CO2. Model simulations for the North Atlantic Ocean and thermodynamic principles reveal that this feedback should be stronger, at present, in colder midlatitude and subpolar waters because of the lower present-day buffer capacity and elevated DIC levels driven by northward advected surface water and/or excess local air-sea CO2 uptake. This buffer capacity feedback mechanism helps to explain at least part of the observed trend of decreasing air-sea ΔpCO2 over time as reported in several other recent North Atlantic studies.
    S. Doney and I. Lima were supported by NSF/ONR NOPP (N000140210370) and NASA (NNG05GG30G).
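For readers unfamiliar with the Revelle factor mentioned in this abstract, its usual textbook definition is reproduced below. The symbol R and the subscript TA (constant total alkalinity) are notation assumed here, not taken from the abstract.

```latex
R = \left. \frac{\partial \ln p\mathrm{CO_2}}{\partial \ln \mathrm{DIC}} \right|_{\mathrm{TA}}
\qquad\Longrightarrow\qquad
\frac{\delta p\mathrm{CO_2}}{p\mathrm{CO_2}} \approx R \, \frac{\delta \mathrm{DIC}}{\mathrm{DIC}}
\quad \text{for small perturbations } \delta .
```

A larger R means that a given relative increase in DIC produces a larger relative rise in surface-water pCO2, which is why an increasing Revelle factor corresponds to a declining buffering capacity.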

    Big Data and Causality

    The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.
    Causality analysis remains one of the fundamental research questions and the ultimate objective of a tremendous number of scientific studies. In line with the rapid progress of science and technology, the age of big data has significantly influenced causality analysis in various disciplines, especially over the last decade, because the complexity and difficulty of identifying causality in big data have increased dramatically. Data mining, the process of uncovering hidden information from big data, is now an important tool for causality analysis and has been extensively exploited by scholars around the world. The primary aim of this paper is to provide a concise review of causality analysis in big data. To this end, the paper reviews recent significant applications of data mining techniques in causality analysis, covering a substantial quantity of research to date, presented in chronological order with an overview table of data mining applications in the causality analysis domain as a reference directory.